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The two-grid algorithm confronts a shifted unitary orthogonal method 
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In this paper I describe a new optimal Krylov subspace solver for shifted unitary matrices called the Shifted 
Unitary Orthogonal Method (SUOM). This algorithm is used as a benchmark against any improvement like the 
two-grid algorithm. I use the latter to show that the overlap operator can be inverted by successive inversions of 
the truncated overlap operator. This strategy results in large gains compared to SUOM. 



It is well-known that overlap fermions [1] lead 
to much more expensive computations than stan- 
dard fermions, i.e. Wilson or Kogut-Sussking 
fermions. This is obvious since for every applica- 
tion of the overlap operator an extra linear system 
solving is needed. For the time being, it seems 
that to get chiral symmetry at finite lattice spac- 
ing one should wait for a Petaflops computer be- 
ing built. 

However, algorithmic research is far from ex- 
hausted. In this paper I give an example that 
this is the case if one uses the two-grid algorithm 
[2]. Before I do this, I introduce briefly an op- 
timal Krylov subspace solver for shifted unitary 
matrices. 

1. SUOM: A NEW OPTIMAL KRYLOV 
SOLVER 

Consider the task of solving the linear system: 



Dx = b, D = Cl t + c 2 V 



(1.1) 



where V — 7ssign(iJw) is a unitary matrix, 11 
the identity matrix, Hyy the Hermitian Wilson 
operator, c\ = (1 + m q )/2,C2 = (1 — m q )/2 
and m q the bare fermion mass. The overlap 
operator D is non-Hermitian. For such opera- 
tors GMRES (Generalised Minimal Residual) and 
FOM (Full Orthogonalisation Method) are known 
to be the fastest. It is shown that when the 
norm-minimising process of GMRES is converg- 
ing rapidly, the residual norms in the correspond- 



ing Galerkin process of FOM exhibit similar be- 
haviour [3]. But they are based on long recur- 
rences and thus require to store a large number 
of vectors of the size of matrix columns. However, 
exploiting the fact that the overlap operator is a 
shifted unitary matrix one can construct a GM- 
RES type algorithm with short recurrences [4]. 

Similarly, a short recurrences algorithm can be 
obtained from FOM. The method is based on 
an observation of Rutishauser [5] that for up- 
per Hessenberg unitary matrices one can write 
H = LU -1 , where L and U are lower and upper 
bidiagonal matrices. Applying this decomposi- 
tion for the Arnoldi iteration: 



VQk = QkHk + hk+i,k1k+ie k 



(1.2) 



one obtains an algorithm which constructs 
Arnoldi vectors Qk by short recurrences [6]: 

VQkUk = QkL k + Zfc+i,fc<?fe+i e fe • (I- 3 ) 

Projecting the linear system (1.1) onto the Krylov 
subspace one gets: 

(cilfc + c 2 L k U k 1 )y k = ei (1.4) 

which can be equivalently written as: 

(ciU k + c 2 Lk)z k = ei, y k = U k z k - (1.5) 

Note that the matrix on the left hand side is 
tridiagonal. It can be shown that one can solve 
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this system and therefore the original system us- 
ing short recurrences [6] . The resulting algorithm 
is called the Shifted Unitary Orthogonal Method 
(SUOM) and is given below: 

Algorithm 1 SUOM algorithm 

p = \\b\\ 2 ; qo = b/p; w = qo 
loo = qoVq 
q = Vq - l 00 q 

{w = \\q\W, qi = q/ho 

ho — c l + C2I0Q 

ao = p/loo; x = a w : r = b - a Dw 
for k = 1,2,... do 

Uk-ik = -qk-i v ik/qk-i v Qk-i 
hk = qj^Vqk + Uk-ikqk v qk-i 

q = (V - hk)qk + Uk-ikVqk-i 

lk+lk = H9II2 

qk+i = q/h+ik 

Ikk = Ci + C 2 lkk — C\Cllkk-\Uk-\kllk-\k-\ 
OLk = ~C2lkk-l/lkkCtk-l 

Wk = qk + Uk-lkqk-l - ClMfc-lfc /Ik-lk-lWk-l 

x k = x k -i + a k w k 
rk = r-fe-i - a k Dw k 
Stop if ||r fc || 2 < tol p 
end for 



Note that in an actual implementation one can 
store Vqk and Dwk as separate vectors, which can 
be used in the subsequent iteration to compute 
Dwk+i- Therefore only one multiplication by V 
is needed at each step. 

2. THE TWO-GRID ALGORITHM 

A straightforward application of multigrid al- 
gorithms is hopeless in the presence of non- 
smooth gauge fields. However, the situation is 
different for the 5-dimensional formulation of chi- 
ral fcrmions where there are no gauge connections 
along the fifth dimension. Here, I will limit my 
discussion in the easiest case which consists of 
two grids: the "fine" grid, which is the contin- 
uum along the fifth coordinate and a coarse grid, 
which is the lattice discretisation of the "fine" 
grid. 

I define chiral fermions on the coarse grid using 
truncated overlap fermions [7]. The correspond- 
ing 5-dimensional matrix Ai in blocked form is 



given by: 

/ D w - 1 {D w + 1)P+ ~m q {D w + l)P-\ 
{D W + 1)P- 

\-m q (D w + T)P+ D w -1 J 

where P± = (I4 + 7 5 )/2. Let Mi be the above 
matrix but with bare quark mass m q = 1 and P 
the permutation matrix: 

/P + P \ 
P+ '•• 

'•• P- 
\P- P+J 

It can be shown that the following result hold 
[2,8]: 

Proposition 2.1 Let P T M^ 1 MPx = V be the 

linear system defined on the 5-dimensional lat- 
tice with x = (2/: X^ 2 \ ■ ■ ■ ! X^'^y an d V = 
(r, o, . . . , o) T . Then y is the solution of the lin- 
ear system D^ N -^y = r, where D^ N ^ — > D as 
N 5 -> 00. 

This result lends itself to a special two-grid al- 
gorithm [2,8]. Indeed, x$ = 05 is the (fifth Eu- 
clidean) coordinate of interest since it contains 
the information about the 4-dimensional physics. 
One way of exploiting this is to use "decimation" 

Algorithm 2 The two-grid algorithm 

xi eC N ; n=b- Dxi; tol,tol G M+ 
for i = 1,2,... do 

Let^ = (r 4 , O ,..., ) T eC Ar ^ 

Let Xi + i = (»i + i,Xa ) i,--,xffl ) ) r eC^ 

Solve MPxi+i = MiPrji until 

\\MiPth - MP X i+i\\2 < tol \\MiPr)i\\ 2 

Xi+i = Xi + y i+ i 

n+i = b- Dx i+ i 

Stop if |r l+ i|| 2 < tol ||6|| 2 
end for 

over the fifth coordinate in order to get the 5d- 
vector 77. Using proposition 2.1 one can evaluate 
directly the first 4d-component of -q by r — b—Dx, 
x being an approximate solution. The rest can be 
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padded with zero 4d-vectors. The second step is 
to solve the problem on the coarse grid. Finally, 
one can extract the 4d-solution y on this grid and 
correct the "fine" grid solution by a; <— x + y. 
In the second cycle one has to repeat the same 
decimation method, since the "fine" 5d-operator 
is not available. Hence, the whole scheme is a 
restarted two-grid algorithm, which is given here 
as Algorithm 2. 

3. COMPARISON OF METHODS 

In Fig. 1 is shown the convergence of vari- 
uos algorithms as a function of Wilson matrix- 
vector multiplication number on a fixed gauge 
background on a 8 3 16 lattice at j3 = 5.7. The 
convergence is measured using the norm of the 
residual error. For the overlap matrix- vector mul- 
tiplication is used the double pass Lanczos al- 
gorithm (without small eigenspace projection of 
H\y) as described in [9]. Together with the al- 
gorithms described in the previous sections Fig. 
1 shows the performance of Conjugate Residuals 
(CR), Conjugate Gradients on Normal Equations 
(CGNE) and CG-CHI. The latter is the CGNE 
which solves simultaneously the decoupled chi- 
ral systems appearing in the matrix D D. One 
can observe a gain over CGNE which may be ex- 
plained due to the reduced number of eigenval- 
ues at each chiral sector. However, this gain is 
no more than 10%. On the other hand SUOM 
and CR preform rather similar with SUOM be- 
ing slightly faster in this scale. The gain over 
CGNE is about a factor two. The Two-Grid al- 
gorithm performs the best with a gain of at least 
a factor 6 over SUOM and more than an order 
of magnitude over CGNE. This situation repeats 
itself for a different gauge configuration which is 
not shown here for the lack of space. However, 
if the projection of small eigenvalues is used the 
gain over SUOM/CR should be smaller since the 
Two-Grid algorithm is much less intensive in the 
application of the overlap operator. It is exactly 
the purpose of this comparison to make clear this 
feature of the Two-Grid algorithm. Finally, it is 
(not) surprising that SUOM and CR perform sim- 
ilarly: CR can be shown to be an efficient method 
for normal matrices. Since it is easier to imple- 
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Figure 1. Norm of the residual as a function of the 
number of Dw(M) multiplications for different 
algorithms for M = —1.8 

mcnt CR is more appealing than other Krylov 
solvers. 

The results of this work suggest that the full 
multigrid algorithm along the fifth dimension 
should be the next method to be explored. 
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